Putting Visual Analytics into Practical Use
In this take-home exercise, I will analyze the traffic bottlenecks of the city of Engagement, Ohio, USA.
Before getting started, it is important to ensure that the required R packages are installed. The code chunk below checks whether each package has been installed; if not, it installs the package before loading it into the R environment.
packages = c("tidyverse", "sf", "tmap", "lubridate", "clock", "sftime", "rmarkdown", "plotly")
for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}
The code chunk below imports the geographical attribute files (Buildings.csv, Employers.csv, Apartments.csv, Schools.csv, Pubs.csv and Restaurants.csv) from the data/wkt folder into R using read_sf() of the sf package, saving each as an sf data frame. The GEOM_POSSIBLE_NAMES option tells the reader which column holds the WKT geometry. The participant status data itself is split across 72 ParticipantStatusLogs[n].csv files recording 1,000+ participants over a 15-month period; those will be imported later.
buildings <- read_sf("data/wkt/Buildings.csv",
options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/wkt/Employers.csv",
options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/wkt/Apartments.csv",
options = "GEOM_POSSIBLE_NAMES=location")
schools <- read_sf("data/wkt/Schools.csv",
options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/wkt/Pubs.csv",
options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/wkt/Restaurants.csv",
options = "GEOM_POSSIBLE_NAMES=location")
With the code chunk below, the building polygons are colour-coded by building type, using the buildingType attribute from Buildings.csv.
tmap_mode("plot")
tmap_style("white")
tm_shape(buildings) +
tm_polygons(col = "buildingType",
palette="plasma",
border.col = "black",
border.alpha = .5,
border.lwd = 0.5,
title = "Building Type") +
tm_layout(main.title = 'Composite Map of Engagement City, Ohio USA',
frame = FALSE)
From the composite map plotted, the city can be split into three distinct regions. Residential areas (pink) are situated mostly at the edges and corners of the city, while commercial areas (blue) are concentrated in the centre, with smaller clusters of them in the top-left and bottom-right corners. The four schools (yellow) are also visible on the map.
On top of the initial composite map, additional markers for the various location types are added.
tmap_mode("plot")
tmap_style("white")
tm_shape(buildings) +
tm_polygons(col = "buildingType",
palette="plasma",
border.col = "black",
border.alpha = .5,
border.lwd = 0.5,
title = "Building Type") +
tm_shape(employers) +
tm_dots(col = "blue", size = 0.3, alpha= 0.7) +
tm_shape(apartments) +
tm_dots(col = "red", size = 0.3, alpha= 0.7) +
tm_shape(schools) +
tm_dots(col = "green", size = 1, alpha= 0.7) +
tm_shape(pubs) +
tm_dots(col = "yellow", size = 0.7, alpha= 0.7) +
tm_shape(restaurants) +
tm_dots(col = "cyan", size = 0.7, alpha= 0.7) +
tm_add_legend(title = 'Location Types',
type = 'symbol',
border.col = NA,
labels = c('Employers', 'Apartments', 'Schools', 'Pubs', 'Restaurants'),
col = c('blue', 'red', "green", 'yellow', 'cyan')) +
tm_layout(main.title = 'Composite Map of Engagement City, Ohio USA',
frame = FALSE)
Here in this tmap output, we can now visualize where the schools, pubs and restaurants are.
The six ParticipantStatusLogs[n].csv files covering March 2022 were then read in with read_sf() of the sf package.
logs1 <- read_sf("data/wkt/ParticipantStatusLogs1.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs2 <- read_sf("data/wkt/ParticipantStatusLogs2.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs3 <- read_sf("data/wkt/ParticipantStatusLogs3.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs4 <- read_sf("data/wkt/ParticipantStatusLogs4.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs5 <- read_sf("data/wkt/ParticipantStatusLogs5.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs6 <- read_sf("data/wkt/ParticipantStatusLogs6.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
Subsequently, each imported log was wrangled and cleaned: the timestamp string is parsed into a date-time with date_time_parse(), records are filtered to those with currentMode == "Transport", and only the currentLocation and Timestamp columns are kept.
logs1_selected <- logs1 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs2_selected <- logs2 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs3_selected <- logs3 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs4_selected <- logs4 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs5_selected <- logs5 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs6_selected <- logs6 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
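The six pipelines above are identical, so the repeated wrangling could be factored into a single helper function. The sketch below is my own refactoring, not code from the original write-up, and the function name wrangle_logs is an illustrative choice. (The day column computed in the original is dropped by the final select() anyway, so it is omitted here.)

```r
# Sketch only: wraps the repeated wrangling steps above in one function.
wrangle_logs <- function(logs) {
  logs %>%
    mutate(Timestamp = date_time_parse(timestamp,
                                       zone = "",
                                       format = "%Y-%m-%dT%H:%M:%S")) %>%
    filter(currentMode == "Transport") %>%
    select(currentLocation, Timestamp)
}

logs1_selected <- wrangle_logs(logs1)
logs2_selected <- wrangle_logs(logs2)
```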
For ParticipantStatusLogs6.csv, an additional filter was applied to remove records from April 2022.
logs6_selected <- logs6_selected %>%
filter(Timestamp < '2022-04-01 00:00:00')
After cleaning, the logs are appended into a single sf data frame with the rbind() function.
transport_log_mar_22 <- rbind(logs1_selected,logs2_selected,logs3_selected,logs4_selected,logs5_selected,logs6_selected)
Now that we have the transport logs tabulated, we will need to prepare the hexagon map as the base layer for visualization of the traffic density.
hex <- st_make_grid(buildings,
cellsize=100,
square=FALSE) %>%
st_sf() %>%
rowid_to_column('hex_id')
Next, we use st_join() to assign each transport-log point to the hexagon (hex_id) of the grid created in the previous step that contains it, then count the number of points per hexagon.
points_in_hex_mar_22 <- st_join(transport_log_mar_22,
hex,
join=st_within) %>%
st_set_geometry(NULL) %>%
count(name='pointCount', hex_id)
The following step joins the hex grid with the per-hexagon counts from the previous step on hex_id, producing the cleaned data frame (hex_combined_mar_22) that can then be used for tmap plotting.
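The code for this join step does not appear in the text; a minimal sketch of it, assuming the join was done with left_join() from dplyr and that hexagons with no transport points are filled with 0 (the name hex_combined_mar_22 is taken from the plotting chunk that follows):

```r
# Sketch of the join described above: attach each hexagon's point count
# to the hex grid, filling hexagons with no transport points with 0.
hex_combined_mar_22 <- hex %>%
  left_join(points_in_hex_mar_22, by = "hex_id") %>%
  replace_na(list(pointCount = 0))
```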
The code chunk below plots the hexagon binning map with the March 2022 data extracted and cleaned earlier. A noticeable parameter choice is that fixed breaks are assigned to the bins (style = "fixed"), instead of specifying a number of bins (n) with style = "quantile". The rationale is to allow a fair comparison with the December 2022 data later in the article.
hexmap1 <- tm_shape(hex_combined_mar_22 %>%
filter(pointCount > 0)) +
tm_fill("pointCount",
breaks = c(0, 100, 500, 1000, 5000, 7000, 10000),
style = "fixed",
title = 'Traffic Count') +
tm_borders(alpha = 0.1) +
tm_layout(main.title = 'Hexagon Binning Map of Engagement City\nMarch 2022',
frame = FALSE)
hexmap1
As seen from the hexagon binning map, the high-traffic (dense) routes are the paths that link up the different sectors of the map. This is understandable, as these routes are unavoidable for participants who want or need to travel between the city's sectors.
Now that we have looked at March 2022's traffic data via the hexagon binning map, how about the data from December 2022? Being a year-end month, the initial guess is that it should see lower traffic, on the assumption that more people will be on leave or vacation and hence moving around the city less. Will it still look similar to March 2022?
Hence, the following code chunks perform the same steps as for the March 2022 data to extract and clean the data from December 2022.
logs44 <- read_sf("data/wkt/ParticipantStatusLogs44.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs45 <- read_sf("data/wkt/ParticipantStatusLogs45.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs46 <- read_sf("data/wkt/ParticipantStatusLogs46.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs47 <- read_sf("data/wkt/ParticipantStatusLogs47.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs48 <- read_sf("data/wkt/ParticipantStatusLogs48.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs49 <- read_sf("data/wkt/ParticipantStatusLogs49.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs44_selected <- logs44 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs45_selected <- logs45 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs46_selected <- logs46 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs47_selected <- logs47 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs48_selected <- logs48 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
logs49_selected <- logs49 %>%
mutate(Timestamp = date_time_parse(timestamp,
zone="",
format="%Y-%m-%dT%H:%M:%S")) %>%
mutate(day = get_day(Timestamp)) %>%
filter(currentMode == "Transport") %>%
select(currentLocation,Timestamp)
For ParticipantStatusLogs44.csv and ParticipantStatusLogs49.csv, an additional filter was applied to ensure only records within December 2022 are kept.
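The filtering code itself is not shown in the text; a sketch of what it presumably looked like, mirroring the April filter applied to logs6_selected earlier. The exact boundary dates here are my assumption:

```r
# Assumed boundary filters: keep only December 2022 records from the
# two files that straddle the month boundaries.
logs44_selected <- logs44_selected %>%
  filter(Timestamp >= '2022-12-01 00:00:00')
logs49_selected <- logs49_selected %>%
  filter(Timestamp < '2023-01-01 00:00:00')
```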
transport_log_dec_22 <- rbind(logs44_selected,logs45_selected,logs46_selected,logs47_selected,logs48_selected,logs49_selected)
points_in_hex_dec_22 <- st_join(transport_log_dec_22,
hex,
join=st_within) %>%
st_set_geometry(NULL) %>%
count(name='pointCount', hex_id)
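As with March, the hex_combined_dec_22 data frame used in the plotting chunk that follows must first be built; a sketch, assuming the same left_join() pattern as before:

```r
# Sketch: join December's per-hexagon counts onto the hex grid,
# filling hexagons with no transport points with 0.
hex_combined_dec_22 <- hex %>%
  left_join(points_in_hex_dec_22, by = "hex_id") %>%
  replace_na(list(pointCount = 0))
```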
hexmap2 <- tm_shape(hex_combined_dec_22 %>%
filter(pointCount > 0)) +
tm_fill("pointCount",
breaks = c(0, 100, 500, 1000, 5000, 7000, 10000),
style = "fixed",
title = 'Traffic Count') +
tm_borders(alpha = 0.1) +
tm_layout(main.title = 'Hexagon Binning Map of Engagement City\nDecember 2022',
frame = FALSE)
hexmap2
Similar to March 2022’s hexagon binning map, the locations that experience high traffic are the essential routes that link up the whole city.
The Traffic Count bins were manually assigned because, after plotting both maps with the same number of bins in "quantile" style, I realized the comparison would not be fair: traffic counts in December were much lower than in March, so the range of each bin differed between the two months. Hence, fixed bin intervals were assigned manually instead.
tmap_arrange(hexmap1, hexmap2)
While the previous comparison does show some slight differences in traffic conditions between March and December 2022, it is not obvious to the viewer where the notable traffic changes in the city are.
Hence, the code chunk below calculates the delta between these two months, as December's pointCount minus March's pointCount.
points_in_hex_delta <- merge(points_in_hex_mar_22 %>% as.data.frame(),
points_in_hex_dec_22 %>% as.data.frame(),
by="hex_id",
all = TRUE) %>%
replace(is.na(.), 0) %>%
mutate(pointCount = as.integer(pointCount.y - pointCount.x)) %>%
select(hex_id, pointCount)
head(points_in_hex_delta)
hex_id pointCount
1 169 -38
2 212 -75
3 225 -37
4 226 -50
5 227 -21
6 228 -25
A positive delta count would mean an increase in traffic count from March to December; likewise, a negative delta count would mean a decrease in traffic count from March to December.
The subsequent code chunk plots the delta data as an interactive scatterplot with the plotly package.
points_in_hex_delta$hex_id = as.factor(points_in_hex_delta$hex_id)
p1 <- ggplot(data=points_in_hex_delta, mapping=aes(x=hex_id, y=pointCount)) +
geom_point()
ggplotly(p1)
A quick look at the scatterplot tells us that there are many more locations with negative changes (less traffic) in December than locations with positive changes (more traffic). For example, hex_id 4255's delta was as much as -4140, while hex_id 4473's delta was only +136.
My idea was to plot these changes on the map for easier visualization. However, using all the data points in the delta data frame would likely clutter the map, so some filtering was done to pick out the notable deltas.
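The three notable_* data frames used in the plotting chunk that follows are not defined in the text; a sketch of how they could be derived from points_in_hex_delta. The threshold values here are illustrative assumptions, not the original cut-offs, which would have been chosen by inspecting the scatterplot:

```r
# Assumed thresholds for illustration only; the original cut-offs
# were picked by eyeballing the delta scatterplot.
notable_pos_deltas <- points_in_hex_delta %>%
  filter(pointCount > 100)                            # clear increases
notable_neg_deltas <- points_in_hex_delta %>%
  filter(pointCount < -3000)                          # largest decreases
notable_slight_neg_deltas <- points_in_hex_delta %>%
  filter(pointCount <= -1000, pointCount >= -3000)    # significant decreases
```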
The code chunk below plots the notable delta points onto the composite map. Note that the employers, apartments and schools dots were not plotted, as they do not add value to this analysis and would end up cluttering the map.
tmap_mode("plot")
tmap_style("white")
tm_shape(buildings) +
tm_polygons(col = 'buildingType',
palette="plasma",
alpha = 0.4,
border.col = "black",
border.alpha = 0.5,
border.lwd = 0.5,
title = "Building Type") +
# tm_shape(employers) +
# tm_dots(col = "blue", size = 0.2, alpha= 0.7) +
# tm_shape(apartments) +
# tm_dots(col = "pink", size = 0.2, alpha= 0.7) +
# tm_shape(schools) +
# tm_dots(col = "green", size = 0.5, alpha= 0.7) +
tm_shape(pubs) +
tm_dots(col = "yellow", size = 0.9, alpha= 0.7) +
tm_shape(restaurants) +
tm_dots(col = "cyan", size = 0.9, alpha= 0.7) +
tm_add_legend(title = 'Location Types',
type = 'symbol',
border.col = NA,
labels = c('Pubs', 'Restaurants', '+VE Change','Sig -VE Change','Largest -VE Change'),
col = c('yellow', 'cyan', 'magenta1','black','orangered1')) +
tm_layout(main.title = 'Map of Engagement City, Ohio USA',
frame = FALSE) +
tm_shape(hex_combined_mar_22 %>% filter(hex_id %in% notable_pos_deltas$hex_id)) +
tm_dots(col = "magenta1", size = 0.7, alpha= 1, shape = 18) +
tm_shape(hex_combined_mar_22 %>% filter(hex_id %in% notable_neg_deltas$hex_id)) +
tm_dots(col = "orangered1", size = 0.7, alpha= 1, shape = 18) +
tm_shape(hex_combined_mar_22 %>% filter(hex_id %in% notable_slight_neg_deltas$hex_id)) +
tm_dots(col = "black", size = 0.7, alpha= 1, shape = 18)
As seen from the output map, the overlapping dots tell us which of the establishments are affected, such as which of the 11 pubs (yellow dots) coincide with the marked traffic changes.
Looking at the black dots (significant decrease in traffic) on the map, it is clear that the interconnecting routes between the top and central sectors, and between the bottom and central sectors, saw a significant drop in traffic. As per my earlier guess, this could be due to lower human traffic during the year-end month, when most people are likely to be clearing their leave and hence not commuting to work.
As for the positive traffic changes (pink dots), 3 of the 7 overlapping dots fall on restaurant locations. These restaurants are located in the commercial area, and more analysis would be needed to understand why they might have benefited from the December crowd.